20 research outputs found

    Solving Limited Memory Influence Diagrams

    We present a new algorithm for exactly solving decision making problems represented as influence diagrams. We do not require the usual assumptions of no forgetting and regularity; this allows us to solve problems with simultaneous decisions and limited information. The algorithm is empirically shown to outperform a state-of-the-art algorithm on randomly generated problems of up to 150 variables and $10^{64}$ solutions. We show that the problem is NP-hard even if the underlying graph structure of the problem has small treewidth and the variables take on a bounded number of states, but that a fully polynomial time approximation scheme exists for these cases. Moreover, we show that the bound on the number of states is a necessary condition for any efficient approximation scheme. Comment: 43 pages, 8 figures

    Integrating question answering and text-to-SQL in Portuguese

    Deep learning transformers have drastically improved systems that automatically answer questions in natural language. However, different questions demand different answering techniques; here we propose, build and validate an architecture that integrates different modules to answer two distinct kinds of queries. Our architecture takes a free-form natural language text and classifies it to send it either to a Neural Question Answering Reasoner or a Natural Language parser to SQL. We implemented a complete system for the Portuguese language, using some of the main tools available for the language and translating training and testing datasets. Experiments show that our system selects the appropriate answering method with high accuracy (over 99%), thus validating a modular question answering strategy. Comment: Published at International Conference on the Computational Processing of Portuguese (PROPOR 2022)
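The classify-then-route architecture described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the cue-word classifier, the handler stubs and every name in the block are hypothetical stand-ins, not the trained classifier or the modules of the published system.

```python
from typing import Callable, Dict

# Hypothetical cue words (assumption): the published system uses a trained
# classifier, not a keyword list.
SQL_CUES = ("quantos", "quantas", "liste", "total")


def classify(question: str) -> str:
    """Toy stand-in classifier: label a question as 'sql' or 'qa'."""
    words = question.lower().split()
    return "sql" if any(cue in words for cue in SQL_CUES) else "qa"


def answer_with_qa(question: str) -> str:
    """Stub for the neural question answering reasoner."""
    return f"[QA reasoner] {question}"


def answer_with_sql(question: str) -> str:
    """Stub for the natural language to SQL parser."""
    return f"[text-to-SQL parser] {question}"


# Dispatch table: one handler per question type.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "qa": answer_with_qa,
    "sql": answer_with_sql,
}


def route(question: str) -> str:
    """Classify the question, then forward it to the matching module."""
    return HANDLERS[classify(question)](question)
```

Keeping the classifier separate from the dispatch table means either answering module can be replaced without touching the routing logic.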

    Topic models in user review automatic classification.

    There is a large number of user reviews on the internet with valuable information on services, products, politics and trends. There is both scientific and economic interest in the automatic understanding of such data. Sentiment classification is concerned with the automatic extraction of opinions expressed in user reviews. Unlike standard text categorization tasks, which classify documents into subjects such as sports, economics and tourism, sentiment classification attempts to tag documents with the feelings they express. Compared to standard text classifiers, sentiment classifiers have shown poor performance. One possible cause of this poor performance is the lack of adequate representations that allow opinions to be discriminated in a concise and machine-readable form. Topic models are statistical models concerned with the extraction of semantic information hidden in the large amount of data available in text collections. They represent a document as a mixture of topics, where each topic is a probability distribution over words that represents a semantic concept implicit in the data. In a topic model representation, words are replaced by topics that concisely represent their meaning. Indeed, topic models perform a dimensionality reduction of the data that can improve the performance of text classification and information retrieval techniques. In sentiment classification, they can provide the necessary representation by extracting topics that represent the feelings expressed in the text.
This work presents a study of the use of topic models for representing and classifying user reviews with respect to the feelings they express. In particular, the Latent Dirichlet Allocation (LDA) model and four extensions (two of them developed by the author) are evaluated on the task of multi-aspect sentiment classification. The extensions to the LDA model enable us to investigate the effects of incorporating additional information such as context, aspect ratings and multiple aspect ratings into the original model.
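The idea that a document is a mixture of topics, each a probability distribution over words, can be made concrete with a small sketch. The three-word vocabulary, the two topics and all numbers below are invented for illustration; they are not taken from the thesis.

```python
from typing import Dict, List

# Hypothetical topic-word distributions (each sums to 1 over the vocabulary).
TOPICS: List[Dict[str, float]] = [
    {"great": 0.5, "room": 0.3, "dirty": 0.2},  # a mostly positive topic
    {"great": 0.1, "room": 0.3, "dirty": 0.6},  # a mostly negative topic
]


def word_distribution(topic_weights: List[float]) -> Dict[str, float]:
    """Mix the topics: p(word | doc) = sum_k theta_k * phi_k(word)."""
    vocab = TOPICS[0].keys()
    return {
        word: sum(theta * topic[word] for theta, topic in zip(topic_weights, TOPICS))
        for word in vocab
    }


# A document that is 70% topic 0 and 30% topic 1 (assumed proportions).
doc_words = word_distribution([0.7, 0.3])
```

The two topic proportions are the reduced representation: a classifier can work with them instead of with one feature per vocabulary word.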

    Algorithms and complexity results for discrete probabilistic reasoning tasks

    Many solutions to problems in machine learning and artificial intelligence involve solving a combinatorial optimization problem over discrete variables whose functional dependence is conveniently represented by a graph. This thesis addresses three types of these combinatorial optimization problems, namely, the maximum a posteriori inference in discrete probabilistic graphical models, the selection of optimal strategies for limited memory influence diagrams, and the computation of upper and lower probability bounds in credal networks. These three problems arise out of seemingly very different situations, and one might believe that they share no more than the graph-based specification of their inputs or the underlying probabilistic treatment of uncertainty. However, correspondences among instances of these problems have long been noticed in the literature. For instance, the computation of probability bounds in credal networks can be reduced either to the problem of maximum a posteriori inference in graphical models, or to the selection of optimal strategies in limited memory influence diagrams. Conversely, both the maximum a posteriori inference and the strategy selection problems can be reduced to the computation of a probability bound in a credal network. These reductions suggest that much insight can be gained by carrying out a joint study of the practical and theoretical computational complexity of these three problems. This thesis describes algorithms and complexity results for these three classes of problems. In particular, we develop a new anytime algorithm for the maximum a posteriori problem. Not only is the algorithm of practical relevance, as we show that it compares favorably against a state-of-the-art method, but it is also the basis of the proof of polynomial-time approximability of the two other problems.
We characterize the tractability of the strategy selection problem according to the input parameters, and we show that the strategy selection problem can be solved in polynomial time in singly connected diagrams over binary variables and univariate utility functions, and that relaxing any of these assumptions makes the problem NP-hard to solve or even approximate within any bound. We also investigate the theoretical complexity of computing upper and lower probability bounds in credal networks. We show that the complexity of the problem depends on the irrelevance concept adopted, but is in general NP-hard even in polytree-shaped networks, and even in trees if we assume strong independence. We also show that there is a particular type of inference that can be solved in polynomial time in imprecise hidden Markov models, whether we assume epistemic irrelevance or strong independence.

    Timing to intubation COVID-19 patients: can we put it off until tomorrow?

    Background: The decision to intubate COVID-19 patients receiving non-invasive respiratory support is challenging, requiring a fine balance between early intubation, with the risks of invasive mechanical ventilation, and the adverse effects of delaying intubation. Objective: To analyze the relationship between intubation day and mortality in COVID-19 patients. Methods: A single-center retrospective cohort study considering all adults with laboratory-confirmed SARS-CoV-2 infection consecutively admitted to a tertiary hospital between March 2020 and August 2020 and requiring invasive mechanical ventilation. The primary outcome was all-cause mortality within 28 days after intubation, and a Cox model was used to evaluate the effect of time from onset of symptoms to intubation on mortality. Results: A total of 592 (20%) consecutive adult patients out of 3020 admitted with COVID-19 were intubated during the study period. The median time from admission to intubation was one day (interquartile range, 0-3), and 310 patients (52%) who were intubated and mechanically ventilated died within 28 days after intubation. Each additional day between the onset of symptoms and intubation was significantly associated with higher in-hospital death (adjusted hazard ratio, 1.018; 95% CI, 1.005-1.03). Conclusion: Among patients infected with SARS-CoV-2 who were intubated and mechanically ventilated, delaying intubation in the course of symptoms may be associated with higher mortality.
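To give a sense of scale for the per-day hazard ratio reported here, the sketch below compounds it over several days. It assumes the delay enters the Cox model as a single linear term, so that the hazard ratio for k extra days is the per-day ratio raised to the power k; the abstract does not state the model specification, so this is an illustrative assumption, and the function name is hypothetical.

```python
def hazard_ratio_for_delay(per_day_hr: float, days: int) -> float:
    """Hazard ratio for `days` extra days, assuming a log-linear effect of delay."""
    return per_day_hr ** days


# Point estimate and 95% CI endpoints reported in the abstract.
PER_DAY_HR = 1.018
CI_LOW, CI_HIGH = 1.005, 1.03

# Implied hazard ratio for a one-week delay (roughly 1.13 under the assumption).
one_week = hazard_ratio_for_delay(PER_DAY_HR, 7)
one_week_ci = (
    hazard_ratio_for_delay(CI_LOW, 7),
    hazard_ratio_for_delay(CI_HIGH, 7),
)
```

Under this assumption a small per-day ratio still corresponds to a hazard roughly 13% higher after a week, with wide uncertainty around that figure.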

    Robustifying Sum-Product Networks

    Sum-product networks are a relatively new and increasingly popular family of probabilistic graphical models that allow for marginal inference with polynomial effort. They have been shown to achieve state-of-the-art performance in several tasks involving density estimation. Sum-product networks are typically learned from data; as such, inferences produced with them are prone to be unreliable and overconfident when data is scarce. In this work, we develop the credal sum-product networks, a generalization of sum-product networks that uses set-valued parameters. We present algorithms and complexity results for common inference tasks with this class of models. We also present an approach for assessing the reliability of classifications made with sum-product networks. We apply this approach on benchmark classification tasks as well as a new application in predicting the age of stars. Our experiments show that the use of credal sum-product networks allows us to distinguish between reliable and unreliable classifications with higher accuracy than standard approaches based on (precise) probability values.
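A toy example may help show what set-valued parameters mean for inference. The sketch below builds a tiny sum-product network over two binary variables with a single sum node whose weight is only known to lie in an interval, and returns lower and upper probabilities. The structure, the leaf parameters and the interval are invented for illustration; the paper's algorithms for general credal sum-product networks are not reproduced here.

```python
from typing import Optional, Tuple


def leaf(p_true: float, value: Optional[bool]) -> float:
    """Bernoulli leaf: P(X = value), or 1.0 when X is marginalized (value is None)."""
    if value is None:
        return 1.0
    return p_true if value else 1.0 - p_true


def network(w: float, x1: Optional[bool], x2: Optional[bool]) -> float:
    """Sum node with weights (w, 1 - w) over two product nodes."""
    child_a = leaf(0.9, x1) * leaf(0.2, x2)  # product node A
    child_b = leaf(0.3, x1) * leaf(0.7, x2)  # product node B
    return w * child_a + (1.0 - w) * child_b


def credal_bounds(
    w_low: float, w_high: float, x1: Optional[bool], x2: Optional[bool]
) -> Tuple[float, float]:
    """Lower and upper probability when w ranges over [w_low, w_high].

    The network value is linear in w, so with this single interval-valued
    weight the extremes are attained at the interval endpoints.
    """
    values = [network(w, x1, x2) for w in (w_low, w_high)]
    return min(values), max(values)


# Bounds on P(X1 = True) with X2 marginalized out and w in [0.4, 0.6].
lower, upper = credal_bounds(0.4, 0.6, True, None)
```

A wide gap between the two bounds signals that the conclusion depends heavily on the imprecise parameter, which is the kind of reliability signal the abstract refers to.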